{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 - Sequence to Sequence Learning with Neural Networks\n",
    "\n",
    "\n",
    "## Introduction\n",
    "\n",
    "采用encoder-decoder的架构搭建，总体结构如下图\n",
    "\n",
    "![](assets/seq2seq1.png)\n",
    "\n",
    "原始语句 \"guten morgen\"通过embedding layer (图中黄色) 和 encoder (图中绿色). 我们添加 *start of sequence* (`<sos>`) 以及 *end of sequence* (`<eos>`) 。encoder function可以被概述为:\n",
    "\n",
    "$$h_t = \\text{EncoderRNN}(e(x_t), h_{t-1})$$\n",
    "\n",
    "RNN可以被替换为很多其他网络，例如在image captioning task中，可以使用resnets或者vgg等CNN网络代替；在machine translation task中，可以使用LSTM等替代。\n",
    "\n",
    "decoder funtion可以被写为：\n",
    "\n",
    "$$s_t = \\text{DecoderRNN}(d(y_t), s_{t-1})$$\n",
    "\n",
    "其中$s_{t-1}$表示上一个单词的embedding。即在每一轮迭代中，我们使用上一个单元的输出和原始的encoding作为decoder的输入。\n",
    "\n",
    "decoder的输出往往是单词的embedding，因此还需要一个线性层将embedding映射到one hot layer。可以被写为：\n",
    "\n",
    "$$\\hat{y}_t = f(s_t)$$\n",
    "\n",
    "在训练模型时，我们知道整个句子中有多少单词，因此可以在第N次迭代后停止。在测试时，当decoder的输出为`<eos>`时，停止即可。\n",
    "\n",
    "\n",
    "## 数据准备\n",
    "\n",
    "我们使用Pytorch搭建模型，并使用TorchText中自带的数据集训练。使用spacy中的tokenizer做分词。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "\n",
    "from torchtext.datasets import Multi30k\n",
    "from torchtext.data import Field, BucketIterator\n",
    "\n",
    "import spacy\n",
    "import numpy as np\n",
    "\n",
    "import random\n",
    "import math\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "设置随机种子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "SEED = 1234\n",
    "\n",
    "random.seed(SEED)\n",
    "np.random.seed(SEED)\n",
    "torch.manual_seed(SEED)\n",
    "torch.cuda.manual_seed(SEED)\n",
    "torch.backends.cudnn.deterministic = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\"good morning!\" -------> [\"good\", \"morning\", \"!\"].\n",
    "\n",
    "德语------>英语"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "spacy_de = spacy.load('de_core_news_sm')\n",
    "spacy_en = spacy.load('en_core_web_sm')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize_de(text):\n",
    "    \"\"\"\n",
    "    Tokenizes German text from a string into a list of strings (tokens) and reverses it\n",
    "    \"\"\"\n",
    "    return [tok.text for tok in spacy_de.tokenizer(text)][::-1]\n",
    "\n",
    "def tokenize_en(text):\n",
    "    \"\"\"\n",
    "    Tokenizes English text from a string into a list of strings (tokens)\n",
    "    \"\"\"\n",
    "    return [tok.text for tok in spacy_en.tokenizer(text)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "设置起始单词和终止单词"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "SRC = Field(tokenize = tokenize_de, \n",
    "            init_token = '<sos>', \n",
    "            eos_token = '<eos>', \n",
    "            lower = True)\n",
    "\n",
    "TRG = Field(tokenize = tokenize_en, \n",
    "            init_token = '<sos>', \n",
    "            eos_token = '<eos>', \n",
    "            lower = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下载数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "downloading training.tar.gz\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      ".data\\multi30k\\training.tar.gz: 100%|██████████████████████████████████████████████| 1.21M/1.21M [00:04<00:00, 262kB/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "downloading validation.tar.gz\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      ".data\\multi30k\\validation.tar.gz: 100%|███████████████████████████████████████████| 46.3k/46.3k [00:00<00:00, 89.3kB/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "downloading mmt_task1_test2016.tar.gz\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      ".data\\multi30k\\mmt_task1_test2016.tar.gz: 100%|███████████████████████████████████| 66.2k/66.2k [00:00<00:00, 79.5kB/s]\n"
     ]
    }
   ],
   "source": [
    "train_data, valid_data, test_data = Multi30k.splits(exts = ('.de', '.en'), \n",
    "                                                    fields = (SRC, TRG))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of training examples: 29000\n",
      "Number of validation examples: 1014\n",
      "Number of testing examples: 1000\n"
     ]
    }
   ],
   "source": [
    "print(f\"Number of training examples: {len(train_data.examples)}\")\n",
    "print(f\"Number of validation examples: {len(valid_data.examples)}\")\n",
    "print(f\"Number of testing examples: {len(test_data.examples)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "分词效果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'src': ['.', 'büsche', 'vieler', 'nähe', 'der', 'in', 'freien', 'im', 'sind', 'männer', 'weiße', 'junge', 'zwei'], 'trg': ['two', 'young', ',', 'white', 'males', 'are', 'outside', 'near', 'many', 'bushes', '.']}\n"
     ]
    }
   ],
   "source": [
    "print(vars(train_data.examples[0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "文本预处理，将出现频率低于2的单词去掉"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "SRC.build_vocab(train_data, min_freq = 2)\n",
    "TRG.build_vocab(train_data, min_freq = 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unique tokens in source (de) vocabulary: 7854\n",
      "Unique tokens in target (en) vocabulary: 5893\n"
     ]
    }
   ],
   "source": [
    "print(f\"Unique tokens in source (de) vocabulary: {len(SRC.vocab)}\")\n",
    "print(f\"Unique tokens in target (en) vocabulary: {len(TRG.vocab)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "设置gpu或者cpu运行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "BATCH_SIZE = 128\n",
    "\n",
    "train_iterator, valid_iterator, test_iterator = BucketIterator.splits(\n",
    "    (train_data, valid_data, test_data), \n",
    "    batch_size = BATCH_SIZE, \n",
    "    device = device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Encoder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Encoder(nn.Module):\n",
    "    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.hid_dim = hid_dim\n",
    "        self.n_layers = n_layers\n",
    "        \n",
    "        self.embedding = nn.Embedding(input_dim, emb_dim)\n",
    "        \n",
    "        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout = dropout)\n",
    "        \n",
    "        self.dropout = nn.Dropout(dropout)\n",
    "        \n",
    "    def forward(self, src):\n",
    "        \n",
    "        #src = [src len, batch size]\n",
    "        \n",
    "        embedded = self.dropout(self.embedding(src))\n",
    "        \n",
    "        #embedded = [src len, batch size, emb dim]\n",
    "        \n",
    "        outputs, (hidden, cell) = self.rnn(embedded)\n",
    "        \n",
    "        #outputs = [src len, batch size, hid dim * n directions]\n",
    "        #hidden = [n layers * n directions, batch size, hid dim]\n",
    "        #cell = [n layers * n directions, batch size, hid dim]\n",
    "        \n",
    "        #outputs are always from the top hidden layer\n",
    "        \n",
    "        return hidden, cell"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Decoder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Decoder(nn.Module):\n",
    "    def __init__(self, output_dim, emb_dim, hid_dim, n_layers, dropout):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.output_dim = output_dim\n",
    "        self.hid_dim = hid_dim\n",
    "        self.n_layers = n_layers\n",
    "        \n",
    "        self.embedding = nn.Embedding(output_dim, emb_dim)\n",
    "        \n",
    "        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout = dropout)\n",
    "        \n",
    "        self.fc_out = nn.Linear(hid_dim, output_dim)\n",
    "        \n",
    "        self.dropout = nn.Dropout(dropout)\n",
    "        \n",
    "    def forward(self, input, hidden, cell):\n",
    "        \n",
    "        #input = [batch size]\n",
    "        #hidden = [n layers * n directions, batch size, hid dim]\n",
    "        #cell = [n layers * n directions, batch size, hid dim]\n",
    "        \n",
    "        #n directions in the decoder will both always be 1, therefore:\n",
    "        #hidden = [n layers, batch size, hid dim]\n",
    "        #context = [n layers, batch size, hid dim]\n",
    "        \n",
    "        input = input.unsqueeze(0)\n",
    "        \n",
    "        #input = [1, batch size]\n",
    "        \n",
    "        embedded = self.dropout(self.embedding(input))\n",
    "        \n",
    "        #embedded = [1, batch size, emb dim]\n",
    "                \n",
    "        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))\n",
    "        \n",
    "        #output = [seq len, batch size, hid dim * n directions]\n",
    "        #hidden = [n layers * n directions, batch size, hid dim]\n",
    "        #cell = [n layers * n directions, batch size, hid dim]\n",
    "        \n",
    "        #seq len and n directions will always be 1 in the decoder, therefore:\n",
    "        #output = [1, batch size, hid dim]\n",
    "        #hidden = [n layers, batch size, hid dim]\n",
    "        #cell = [n layers, batch size, hid dim]\n",
    "        \n",
    "        prediction = self.fc_out(output.squeeze(0))\n",
    "        \n",
    "        #prediction = [batch size, output dim]\n",
    "        \n",
    "        return prediction, hidden, cell"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义Seq2Seq模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Seq2Seq(nn.Module):\n",
    "    def __init__(self, encoder, decoder, device):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.encoder = encoder\n",
    "        self.decoder = decoder\n",
    "        self.device = device\n",
    "        \n",
    "        assert encoder.hid_dim == decoder.hid_dim, \\\n",
    "            \"Hidden dimensions of encoder and decoder must be equal!\"\n",
    "        assert encoder.n_layers == decoder.n_layers, \\\n",
    "            \"Encoder and decoder must have equal number of layers!\"\n",
    "        \n",
    "    def forward(self, src, trg, teacher_forcing_ratio = 0.5):\n",
    "        \n",
    "        #src = [src len, batch size]\n",
    "        #trg = [trg len, batch size]\n",
    "        #teacher_forcing_ratio is probability to use teacher forcing\n",
    "        #e.g. if teacher_forcing_ratio is 0.75 we use ground-truth inputs 75% of the time\n",
    "        \n",
    "        batch_size = trg.shape[1]\n",
    "        trg_len = trg.shape[0]\n",
    "        trg_vocab_size = self.decoder.output_dim\n",
    "        \n",
    "        #tensor to store decoder outputs\n",
    "        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)\n",
    "        \n",
    "        #last hidden state of the encoder is used as the initial hidden state of the decoder\n",
    "        hidden, cell = self.encoder(src)\n",
    "        \n",
    "        #first input to the decoder is the <sos> tokens\n",
    "        input = trg[0,:]\n",
    "        \n",
    "        for t in range(1, trg_len):\n",
    "            \n",
    "            #insert input token embedding, previous hidden and previous cell states\n",
    "            #receive output tensor (predictions) and new hidden and cell states\n",
    "            output, hidden, cell = self.decoder(input, hidden, cell)\n",
    "            \n",
    "            #place predictions in a tensor holding predictions for each token\n",
    "            outputs[t] = output\n",
    "            \n",
    "            #decide if we are going to use teacher forcing or not\n",
    "            teacher_force = random.random() < teacher_forcing_ratio\n",
    "            \n",
    "            #get the highest predicted token from our predictions\n",
    "            top1 = output.argmax(1) \n",
    "            \n",
    "            #if teacher forcing, use actual next token as next input\n",
    "            #if not, use predicted token\n",
    "            input = trg[t] if teacher_force else top1\n",
    "        \n",
    "        return outputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "INPUT_DIM = len(SRC.vocab)\n",
    "OUTPUT_DIM = len(TRG.vocab)\n",
    "ENC_EMB_DIM = 256\n",
    "DEC_EMB_DIM = 256\n",
    "HID_DIM = 512\n",
    "N_LAYERS = 2\n",
    "ENC_DROPOUT = 0.5\n",
    "DEC_DROPOUT = 0.5\n",
    "\n",
    "enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, ENC_DROPOUT)\n",
    "dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DEC_DROPOUT)\n",
    "\n",
    "model = Seq2Seq(enc, dec, device).to(device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pytorch 权重初始化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Seq2Seq(\n",
       "  (encoder): Encoder(\n",
       "    (embedding): Embedding(7854, 256)\n",
       "    (rnn): LSTM(256, 512, num_layers=2, dropout=0.5)\n",
       "    (dropout): Dropout(p=0.5, inplace=False)\n",
       "  )\n",
       "  (decoder): Decoder(\n",
       "    (embedding): Embedding(5893, 256)\n",
       "    (rnn): LSTM(256, 512, num_layers=2, dropout=0.5)\n",
       "    (fc_out): Linear(in_features=512, out_features=5893, bias=True)\n",
       "    (dropout): Dropout(p=0.5, inplace=False)\n",
       "  )\n",
       ")"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def init_weights(m):\n",
    "    for name, param in m.named_parameters():\n",
    "        nn.init.uniform_(param.data, -0.08, 0.08)\n",
    "        \n",
    "model.apply(init_weights)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The model has 13,898,757 trainable parameters\n"
     ]
    }
   ],
   "source": [
    "def count_parameters(model):\n",
    "    return sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
    "\n",
    "print(f'The model has {count_parameters(model):,} trainable parameters')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adam 优化器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = optim.Adam(model.parameters())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "损失函数-----> `CrossEntropyLoss` ，每个单词看作一个类别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRG_PAD_IDX = TRG.vocab.stoi[TRG.pad_token]\n",
    "\n",
    "criterion = nn.CrossEntropyLoss(ignore_index = TRG_PAD_IDX)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(model, iterator, optimizer, criterion, clip):\n",
    "    \n",
    "    model.train()\n",
    "    \n",
    "    epoch_loss = 0\n",
    "    \n",
    "    for i, batch in enumerate(iterator):\n",
    "        \n",
    "        src = batch.src\n",
    "        trg = batch.trg\n",
    "        \n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        output = model(src, trg)\n",
    "        \n",
    "        #trg = [trg len, batch size]\n",
    "        #output = [trg len, batch size, output dim]\n",
    "        \n",
    "        output_dim = output.shape[-1]\n",
    "        \n",
    "        output = output[1:].view(-1, output_dim)\n",
    "        trg = trg[1:].view(-1)\n",
    "        \n",
    "        #trg = [(trg len - 1) * batch size]\n",
    "        #output = [(trg len - 1) * batch size, output dim]\n",
    "        \n",
    "        loss = criterion(output, trg)\n",
    "        \n",
    "        loss.backward()\n",
    "        \n",
    "        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)\n",
    "        \n",
    "        optimizer.step()\n",
    "        \n",
    "        epoch_loss += loss.item()\n",
    "        \n",
    "    return epoch_loss / len(iterator)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型评估"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(model, iterator, criterion):\n",
    "    \n",
    "    model.eval()\n",
    "    \n",
    "    epoch_loss = 0\n",
    "    \n",
    "    with torch.no_grad():\n",
    "    \n",
    "        for i, batch in enumerate(iterator):\n",
    "\n",
    "            src = batch.src\n",
    "            trg = batch.trg\n",
    "\n",
    "            output = model(src, trg, 0) #turn off teacher forcing\n",
    "\n",
    "            #trg = [trg len, batch size]\n",
    "            #output = [trg len, batch size, output dim]\n",
    "\n",
    "            output_dim = output.shape[-1]\n",
    "            \n",
    "            output = output[1:].view(-1, output_dim)\n",
    "            trg = trg[1:].view(-1)\n",
    "\n",
    "            #trg = [(trg len - 1) * batch size]\n",
    "            #output = [(trg len - 1) * batch size, output dim]\n",
    "\n",
    "            loss = criterion(output, trg)\n",
    "            \n",
    "            epoch_loss += loss.item()\n",
    "        \n",
    "    return epoch_loss / len(iterator)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll create a function that we'll use to tell us how long an epoch takes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "def epoch_time(start_time, end_time):\n",
    "    elapsed_time = end_time - start_time\n",
    "    elapsed_mins = int(elapsed_time / 60)\n",
    "    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n",
    "    return elapsed_mins, elapsed_secs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型训练与保存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch: 01 | Time: 0m 56s\n",
      "\tTrain Loss: 5.061 | Train PPL: 157.697\n",
      "\t Val. Loss: 4.840 |  Val. PPL: 126.465\n",
      "Epoch: 02 | Time: 0m 56s\n",
      "\tTrain Loss: 4.489 | Train PPL:  88.997\n",
      "\t Val. Loss: 4.774 |  Val. PPL: 118.410\n"
     ]
    },
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-26-3c6802c6222c>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      8\u001b[0m     \u001b[0mstart_time\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      9\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 10\u001b[1;33m     \u001b[0mtrain_loss\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtrain\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtrain_iterator\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0moptimizer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcriterion\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mCLIP\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     11\u001b[0m     \u001b[0mvalid_loss\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mevaluate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalid_iterator\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcriterion\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     12\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m<ipython-input-23-8d78b057d4f4>\u001b[0m in \u001b[0;36mtrain\u001b[1;34m(model, iterator, optimizer, criterion, clip)\u001b[0m\n\u001b[0;32m     27\u001b[0m         \u001b[0mloss\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcriterion\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtrg\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     28\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 29\u001b[1;33m         \u001b[0mloss\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     30\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     31\u001b[0m         \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnn\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mutils\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mclip_grad_norm_\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mparameters\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mclip\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mD:\\anaconda\\lib\\site-packages\\torch\\tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(self, gradient, retain_graph, create_graph)\u001b[0m\n\u001b[0;32m    196\u001b[0m                 \u001b[0mproducts\u001b[0m\u001b[1;33m.\u001b[0m \u001b[0mDefaults\u001b[0m \u001b[0mto\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;31m`\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;31m`\u001b[0m\u001b[1;33m.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    197\u001b[0m         \"\"\"\n\u001b[1;32m--> 198\u001b[1;33m         \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    199\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    200\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0mregister_hook\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhook\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mD:\\anaconda\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables)\u001b[0m\n\u001b[0;32m     98\u001b[0m     Variable._execution_engine.run_backward(\n\u001b[0;32m     99\u001b[0m         \u001b[0mtensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgrad_tensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 100\u001b[1;33m         allow_unreachable=True)  # allow_unreachable flag\n\u001b[0m\u001b[0;32m    101\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    102\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mKeyboardInterrupt\u001b[0m: "
     ]
    }
   ],
   "source": [
    "N_EPOCHS = 10\n",
    "CLIP = 1\n",
    "\n",
    "best_valid_loss = float('inf')\n",
    "\n",
    "for epoch in range(N_EPOCHS):\n",
    "    \n",
    "    start_time = time.time()\n",
    "    \n",
    "    train_loss = train(model, train_iterator, optimizer, criterion, CLIP)\n",
    "    valid_loss = evaluate(model, valid_iterator, criterion)\n",
    "    \n",
    "    end_time = time.time()\n",
    "    \n",
    "    epoch_mins, epoch_secs = epoch_time(start_time, end_time)\n",
    "    \n",
    "    if valid_loss < best_valid_loss:\n",
    "        best_valid_loss = valid_loss\n",
    "        torch.save(model.state_dict(), 'tut1-model.pt')\n",
    "    \n",
    "    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')\n",
    "    print(f'\\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')\n",
    "    print(f'\\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| Test Loss: 3.896 | Test PPL:  49.208 |\n"
     ]
    }
   ],
   "source": [
    "model.load_state_dict(torch.load('tut1-model.pt'))\n",
    "\n",
    "test_loss = evaluate(model, test_iterator, criterion)\n",
    "\n",
    "print(f'| Test Loss: {test_loss:.3f} | Test PPL: {math.exp(test_loss):7.3f} |')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
